SPARK-1795 - Add recursive directory file search to fileInputStream #537

Closed
patrickotoole wants to merge 3 commits into apache:master from patrickotoole:recursive

@patrickotoole

Added recursive directory search to fileInputStream. We want Spark to be able to find files in subdirectories rather than only in the parent directory.

@tdas

tdas commented Apr 24, 2014

Contributor

Can you please add a JIRA for this and put the JIRA number in the title, like other PRs?

@tdas

tdas commented Apr 24, 2014

Contributor

Also, please add a unit test for this use case in the InputStreamsSuite.

Contributor

This looks like an API change - please add a default value to recursive.

Author

I have included a default value on the FileInputDStream but not on the API itself.

Wondering if we want to introduce default values to the more granular version of the API. Currently, it looks like the exposed API essentially has two versions for these methods -- one that assumes default values and one that exposes all the parameters of the DStream constructor.

Thoughts?
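
The trade-off raised here can be sketched roughly as follows. This is only an illustration, not Spark's actual API: `FileStreamApiSketch`, `StreamConfig`, and both `fileStream` signatures are hypothetical stand-ins. It assumes the granular overload gains a `recursive` parameter with a default of `false`, so existing call sites keep compiling and keep their old behavior.

```scala
// Hypothetical stand-in for the streaming API; none of these names are Spark's.
object FileStreamApiSketch {
  final case class StreamConfig(directory: String, newFilesOnly: Boolean, recursive: Boolean)

  // Granular version: exposes every parameter. `recursive` defaults to false,
  // so callers written before the parameter existed still compile unchanged.
  def fileStream(directory: String, newFilesOnly: Boolean, recursive: Boolean = false): StreamConfig =
    StreamConfig(directory, newFilesOnly, recursive)

  // Convenience version: assumes defaults for everything except the directory.
  def fileStream(directory: String): StreamConfig =
    fileStream(directory, newFilesOnly = true)
}
```

One caveat with this approach: a Scala default argument keeps source compatibility but still changes the compiled method signature, and Java callers do not see the default at all, so an explicit overload may still be needed for them.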

In which version of Spark can we get the API with support for nested directory streaming?

@patrickotoole patrickotoole changed the title Add recursive directory file search to fileInputStream SPARK-1795 - Add recursive directory file search to fileInputStream May 11, 2014

If the input directory is already the lowest-level directory (it has no subdirectories), the files in it are not considered.
Example: consider the following directory:
/a/file1.txt
/a/file2.txt
and so on.
If the input directory is given as "/a", there is no output.

We can call this like:

val filePaths: Array[Path] = if (recursive)
  recursiveListDirs(List(fs.getFileStatus(new Path(directoryPath)))).toArray
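
To illustrate the idea of starting the traversal from the input directory itself, here is a minimal sketch. It is an assumption-laden stand-in: it uses `java.io.File` on the local filesystem rather than the Hadoop `FileSystem` API used in the patch, and `RecursiveListingSketch` and `listFilesRecursively` are hypothetical names, not the PR's `recursiveListDirs`. Because the walk begins at the root, files that sit directly in a directory with no subdirectories (the `/a` case above) are still returned.

```scala
import java.io.File

object RecursiveListingSketch {
  // Returns every regular file under `root`, including files directly in `root`.
  def listFilesRecursively(root: File): List[File] = {
    // listFiles() returns null for non-directories or on I/O errors.
    val children = Option(root.listFiles()).map(_.toList).getOrElse(Nil)
    val (dirs, others) = children.partition(_.isDirectory)
    // Keep regular files at this level, then descend into each subdirectory.
    others.filter(_.isFile) ++ dirs.flatMap(listFilesRecursively)
  }
}
```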

@SparkQA

SparkQA commented Sep 5, 2014


Can one of the admins verify this patch?

@tdas

tdas commented Dec 24, 2014

Contributor

@patrickotoole Sorry this patch has been sitting here for so long without any attention. Would you mind updating it to the latest code?

@srowen

srowen commented Jan 23, 2015

Member

I suggest we close this in favor of #2765 since it implements recursion with max depth, merges, and was active more recently.
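
For readers unfamiliar with the idea, recursion with a maximum depth can be sketched like this. This is not the implementation in #2765; it is a hedged illustration that uses the JDK's `java.nio.file.Files.walk` on the local filesystem, and `DepthLimitedListingSketch` and `listFiles` are hypothetical names. With `Files.walk`, a depth of 1 covers only the direct children of the root.

```scala
import java.nio.file.{Files, Path}

object DepthLimitedListingSketch {
  // Lists regular files under `root`, descending at most `maxDepth` levels.
  def listFiles(root: Path, maxDepth: Int): List[Path] = {
    val stream = Files.walk(root, maxDepth)
    try {
      val it = stream.iterator()
      val out = List.newBuilder[Path]
      while (it.hasNext) {
        val p = it.next()
        if (Files.isRegularFile(p)) out += p
      }
      out.result()
    } finally stream.close() // the walk holds open directory handles
  }
}
```

Bounding the depth keeps each listing pass predictable on deep directory trees, at the cost of ignoring files nested below the limit.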

@AmplabJenkins


Can one of the admins verify this patch?

@srowen

srowen commented Apr 27, 2015

Member

Mind closing this PR?

@asfgit asfgit closed this in 8dee274 Apr 29, 2015
helenyugithub pushed a commit to helenyugithub/spark that referenced this pull request Aug 20, 2019
One-line code change which is the initial patch for [HADOOP-16248](https://issues.apache.org/jira/browse/HADOOP-16248). See internal ticket number 87611 for more context.
helenyugithub pushed a commit to helenyugithub/spark that referenced this pull request Aug 20, 2019
* [SPARK-27267][CORE] Update snappy to avoid error when decompressing empty serialized data (apache#531)
* [SPARK-27514][SQL] Skip collapsing windows with empty window expressions (apache#538)
* Bump hadoop to 2.9.2-palantir.5 (apache#537)
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019